海洋充满了称为浮游植物的微型微藻,它们共同负责与陆地上所有植物的光合作用。我们预测他们对变暖海洋的反应的能力取决于了解浮游植物种群的动态如何受环境条件变化的影响。研究浮游植物动力学的一种强大技术是流式细胞仪,它测量每秒成千上万个单个细胞的光学特性。如今,海洋学家能够实时收集流动的细胞仪数据,从而为他们提供了精细的分辨率,可以分配数千公里的浮游植物分布。当前的挑战之一是了解这些大小规模的变化如何与环境条件(例如养分可用性,温度,光线和洋流)有关。在本文中,我们提出了多元回归模型的新型稀疏混合物,以估计随着时间的变化浮游植物的亚群,同时识别预测这些亚种群观察到的变化的特定环境协变量。我们使用合成数据和在2017年春季在东北太平洋进行的海洋学巡游中收集的合成数据和实际观察结果证明了该方法的有用性和解释性。
translated by 谷歌翻译
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with such. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance. In this work, we seek to better understand how we might characterize, detect, and design for data-model synergies. We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population, a phenomenon we refer to as negative data externalities on group performance. Such externalities can arise in standard learning settings and can manifest differently depending on conditions between training set size and model size. Data externalities directly imply a lower bound on feasible model improvements, yet improving models efficiently requires understanding the underlying data-model tensions. From a broader perspective, our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
translated by 谷歌翻译
有限的公开数据可以支持恶意软件分析技术的研究。特别是,几乎没有由杜鹃/斗篷等丰富的沙盒生成的公开可用数据集。使用动态沙箱的好处是对目标机中文件执行的逼真模拟并获得该执行日志。机器可以被恶意软件感染,因此很有可能在执行日志中捕获恶意行为,从而使研究人员可以详细研究这种行为。尽管随后对日志信息的分析在工业网络安全后端被广泛介绍,但据我们所知,仅在学术界投入了有限的努力,以使用最先进的技术提高此类日志分析功能。我们使此示例数据集可用来支持设计新的机器学习方法以进行恶意软件检测,尤其是用于自动检测通用恶意行为。该数据集是在Avast软件和捷克技术大学-AI中心(AIC)之间合作的。
translated by 谷歌翻译
许多基于模型的强化学习方法(MBRL)为他们可以提供的马尔可夫决策过程(MDP)模型的准确性和学习效率提供了保证。同时,状态抽象技术允许减少MDP的大小,同时相对于原始问题保持有限的损失。因此,令人惊讶的是,在结合两种技术时,即MBRL仅观察抽象状态时,没有任何保证可用。我们的理论分析表明,抽象可以在网上收集的样本(例如在现实世界中)引入依赖性,这意味着MBRL的大多数结果不能直接扩展到此设置。这项工作的新结果表明,可以使用Martingales的浓度不平等来克服此问题,并允许将R-MAX等算法的结果扩展到以抽象为设置的算法。因此,通过抽象的模型为抽象的RL生成了第一个性能保证:基于模型的强化学习。
translated by 谷歌翻译
常规进行了视频支气管镜检查,以涉嫌癌症,监测COPD患者的肺组织活检以及在重症监护病房中澄清急性呼吸问题。复杂的支气管树中的导航尤其具有挑战性和身体要求,需要医生的长期经验。本文介绍了支气管镜视频中支气管孔的自动分割。由于缺乏易于获取的地面真相分段数据,目前,基于学习的深度方法被阻碍。因此,我们提出了一个由K均值组成的数据驱动管道,然后是基于紧凑的标记的流域算法,该算法能够从给定的深度图像中生成气道实例分割图。通过这种方式,这些传统算法是仅基于Phantom数据集的RGB图像上直接在RGB图像上训练浅CNN的弱监督。我们在两个体内数据集上评估了该模型的概括能力,这些数据集涵盖21个不同的支气管镜上的250帧。我们证明其性能与那些在体内数据中直接训练的模型相当,通过128x128的图像分辨率,对于检测到的气道分割中心的平均误差为11 vs 5像素。我们的定量和定性结果表明,在视频支气管镜检查,幻影数据和弱监督的背景下,使用基于非学习的方法可以获得对气道结构的语义理解。
translated by 谷歌翻译
为了能够在不怀疑的情况下使用人工智能(AI)在医学中,并认识到和评估其日益增长的潜力,在当前和未来的医务人员中,对该主题的基本理解是必要的。在“通过理解的信任”的前提下,我们在德国Ki校园(AI校园)项目框架内开发了创新的在线课程,这是一个自我指导的课程,它教授AI的基础知识进行分析医疗图像数据。主要目标是提供一个学习环境,以充分了解医学图像分析中的AI,以便通过积极的应用经验来克服对该主题的进一步兴趣,并可以克服对其使用的抑制。重点是医疗应用和机器学习的基础。在线课程分为连续的课程,其中包括以解释性视频的形式,以简化和实践练习和/或测验的形式进行的实践练习,以检查学习进度。在课程的第一次跑步中,参与医学生的一项调查用于定量分析我们的研究假设。
translated by 谷歌翻译
脑小血管疾病的成像标记提供了有关脑部健康的宝贵信息,但是它们的手动评估既耗时又受到实质性内部和间际变异性的阻碍。自动化评级可能受益于生物医学研究以及临床评估,但是现有算法的诊断可靠性尚不清楚。在这里,我们介绍了\ textIt {血管病变检测和分割}(\ textit {v textit {where valdo?})挑战,该挑战是在国际医学图像计算和计算机辅助干预措施(MICCAI)的卫星事件中运行的挑战(MICCAI) 2021.这一挑战旨在促进大脑小血管疾病的小而稀疏成像标记的自动检测和分割方法的开发,即周围空间扩大(EPVS)(任务1),脑微粒(任务2)和预先塑造的鞋类血管起源(任务3),同时利用弱和嘈杂的标签。总体而言,有12个团队参与了针对一个或多个任务的解决方案的挑战(任务1 -EPVS 4,任务2 -Microbleeds的9个,任务3 -lacunes的6个)。多方数据都用于培训和评估。结果表明,整个团队和跨任务的性能都有很大的差异,对于任务1- EPV和任务2-微型微型且对任务3 -lacunes尚无实际的结果,其结果尤其有望。它还强调了可能阻止个人级别使用的情况的性能不一致,同时仍证明在人群层面上有用。
translated by 谷歌翻译
可以使用X射线自由电子激光器的强脉冲和短脉冲直接通过单次相干衍射成像直接观察到自由飞行中孤立的纳米样品的结构和动力学。广角散射图像甚至编码样品的三维形态信息,但是该信息的检索仍然是一个挑战。到目前为止,只有通过与高度约束模型拟合,需要对单镜头实现有效的三维形态重建,这需要有关可能的几何形状的先验知识。在这里,我们提出了一种更通用的成像方法。依赖于允许凸多面体描述的任何样品形态的模型,我们从单个银纳米颗粒中重建广角衍射模式。除了具有高对称性的已知结构动机外,我们还检索了以前无法访问的不完美形状和聚集物。我们的结果为单个纳米颗粒的真实3D结构确定以及最终的超快纳米级动力学的3D电影开辟了新的途径。
translated by 谷歌翻译